AITopics | linkage function

Collaborating Authors

linkage function

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hierarchical Linkage Clustering Beyond Binary Trees and Ultrametrics

Dreveton, Maximilien, Grossglauser, Matthias, Kuroda, Daichi, Thiran, Patrick

arXiv.org Machine LearningNov-25-2025

Hierarchical clustering seeks to uncover nested structures in data by constructing a tree of clusters, where deeper levels reveal finer-grained relationships. Traditional methods, including linkage approaches, face three major limitations: (i) they always return a hierarchy, even if none exists, (ii) they are restricted to binary trees, even if the true hierarchy is non-binary, and (iii) they are highly sensitive to the choice of linkage function. In this paper, we address these issues by introducing the notion of a valid hierarchy and defining a partial order over the set of valid hierarchies. We prove the existence of a finest valid hierarchy, that is, the hierarchy that encodes the maximum information consistent with the similarity structure of the data set. In particular, the finest valid hierarchy is not constrained to binary structures and, when no hierarchical relationships exist, collapses to a star tree. We propose a simple two-step algorithm that first constructs a binary tree via a linkage method and then prunes it to enforce validity. We establish necessary and sufficient conditions on the linkage function under which this procedure exactly recovers the finest valid hierarchy, and we show that all linkage functions satisfying these conditions yield the same hierarchy after pruning. Notably, classical linkage rules such as single, complete, and average satisfy these conditions, whereas Ward's linkage fails to do so.

hierarchy, linkage, valid hierarchy, (15 more...)

arXiv.org Machine Learning

2511.18056

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(2 more...)

Genre:

Workflow (0.67)
Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

Order preserving hierarchical agglomerative clustering

Bakkelund, Daniel

arXiv.org Machine LearningNov-17-2020

We present a method for hierarchical clustering of directed acyclic graphs and other strictly partially ordered data that preserves the data structure. In particular, if we have $a

dendrogram, partial dendrogram, relation, (15 more...)

arXiv.org Machine Learning

2004.12488

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms

Han, Xin, Zhu, Ye, Ting, Kai Ming, Li, Gang

arXiv.org Artificial IntelligenceOct-12-2020

Agglomerative hierarchical clustering (AHC) is one of the popular clustering approaches. Existing AHC methods, which are based on a distance measure, have one key issue: it has difficulty in identifying adjacent clusters with varied densities, regardless of the cluster extraction methods applied on the resultant dendrogram. In this paper, we identify the root cause of this issue and show that the use of a data-dependent kernel (instead of distance or existing kernel) provides an effective means to address it. We analyse the condition under which existing AHC methods fail to extract clusters effectively; and the reason why the data-dependent kernel is an effective remedy. This leads to a new approach to kernerlise existing hierarchical clustering algorithms such as existing traditional AHC algorithms, HDBSCAN, GDL and PHA. In each of these algorithms, our empirical evaluation shows that a recently introduced Isolation Kernel produces a higher quality or purer dendrogram than distance, Gaussian Kernel and adaptive Gaussian Kernel.

algorithm, dendrogram, linkage function, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.patcog.2023.109517

2010.05473

Country:

Oceania > Australia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Kansas (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Compact Representation of Uncertainty in Hierarchical Clustering

Greenberg, Craig S., Macaluso, Sebastian, Monath, Nicholas, Lee, Ji-Ah, Flaherty, Patrick, Cranmer, Kyle, McGregor, Andrew, McCallum, Andrew

arXiv.org Machine LearningFeb-26-2020

Hierarchical clustering is a fundamental task often used to discover meaningful structures in data, such as phylogenetic trees, taxonomies of concepts, subtypes of cancer, and cascades of particle decays in particle physics. When multiple hierarchical clusterings of the data are possible, it is useful to represent uncertainty in the clustering through various probabilistic quantities. Existing approaches represent uncertainty for a range of models; however, they only provide approximate inference. This paper presents dynamic-programming algorithms and proofs for exact inference in hierarchical clustering. We are able to compute the partition function, MAP hierarchical clustering, and marginal probabilities of sub-hierarchies and clusters. Our method supports a wide range of hierarchical models and only requires a cluster compatibility function. Rather than scaling with the number of hierarchical clusterings of $n$ elements ($\omega(n n! / 2^{n-1})$), our approach runs in time and space proportional to the significantly smaller powerset of $n$. Despite still being large, these algorithms enable exact inference in small-data applications and are also interesting from a theoretical perspective. We demonstrate the utility of our method and compare its performance with respect to existing approximate methods.

algorithm, hierarchy, partition function, (15 more...)

arXiv.org Machine Learning

2002.11661

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Scalable Hierarchical Clustering with Tree Grafting

Monath, Nicholas, Kobren, Ari, Krishnamurthy, Akshay, Glass, Michael, McCallum, Andrew

arXiv.org Machine LearningDec-31-2019

We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the model is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing the ground-truth, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.

algorithm, grinch, linkage function, (15 more...)

arXiv.org Machine Learning

doi: 10.1145/3292500.3330929

2001.00076

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Supervised Hierarchical Clustering with Exponential Linkage

Yadav, Nishant, Kobren, Ari, Monath, Nicholas, McCallum, Andrew

arXiv.org Machine LearningJun-18-2019

In supervised clustering, standard techniques for learning a pairwise dissimilarity function often suffer from a discrepancy between the training and clustering objectives, leading to poor cluster quality. Rectifying this discrepancy necessitates matching the procedure for training the dissimilarity function to the clustering algorithm. In this paper, we introduce a method for training the dissimilarity function in a way that is tightly coupled with hierarchical clustering, in particular single linkage. However, the appropriate clustering algorithm for a given dataset is often unknown. Thus we introduce an approach to supervised hierarchical clustering that smoothly interpolates between single, average, and complete linkage, and we give a training procedure that simultaneously learns a linkage function and a dissimilarity function. We accomplish this with a novel Exponential Linkage function that has a learnable parameter that controls the interpolation. In experiments on four datasets, our joint training procedure consistently matches or outperforms the next best training procedure/linkage function pair and gives up to 8 points improvement in dendrogram purity over discrepant pairs.

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Machine Learning

1906.07859

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Robust Clustering for Time Series Using Spectral Densities and Functional Data Analysis

Rivera-García, Diego, García-Escudero, Luis Angel, Mayo-Iscar, Agustín, Ortega, Joaquín

arXiv.org Machine LearningFeb-7-2017

In this work a robust clustering algorithm for stationary time series is proposed. The algorithm is based on the use of estimated spectral densities, which are considered as functional data, as the basic characteristic of stationary time series for clustering purposes. A robust algorithm for functional data is then applied to the set of spectral densities. Trimming techniques and restrictions on the scatter within groups reduce the effect of noise in the data and help to prevent the identification of spurious clusters. The procedure is tested in a simulation study, and is also applied to a real data set.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1702.02165

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.89)

Add feedback

Ethnicity sensitive author disambiguation using semi-supervised learning

Louppe, Gilles, Al-Natsheh, Hussein, Susik, Mateusz, Maguire, Eamonn

arXiv.org Machine LearningMay-4-2016

Author name disambiguation in bibliographic databases is the problem of grouping together scientific publications written by the same person, accounting for potential homonyms and/or synonyms. Among solutions to this problem, digital libraries are increasingly offering tools for authors to manually curate their publications and claim those that are theirs. Indirectly, these tools allow for the inexpensive collection of large annotated training data, which can be further leveraged to build a complementary automated disambiguation system capable of inferring patterns for identifying publications written by the same person. Building on more than 1 million publicly released crowdsourced annotations, we propose an automated author disambiguation solution exploiting this data (i) to learn an accurate classifier for identifying coreferring authors and (ii) to guide the clustering of scientific publications by distinct authors in a semi-supervised way. To the best of our knowledge, our analysis is the first to be carried out on data of this size and coverage. With respect to the state of the art, we validate the general pipeline used in most existing solutions, and improve by: (i) proposing phonetic-based blocking strategies, thereby increasing recall; and (ii) adding strong ethnicity-sensitive features for learning a linkage function, thereby tailoring disambiguation to non-Western author names whenever necessary.

artificial intelligence, disambiguation, machine learning, (18 more...)

arXiv.org Machine Learning

1508.07744

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Anytime Hierarchical Clustering

Arslan, Omur, Koditschek, Daniel E.

arXiv.org Machine LearningApr-13-2014

We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees we prove for a fixed data set must terminate in a chain of nested partitions that satisfies a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence to suggest that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed /parallel computation of clustering hierarchies and online tracking of clustering trees applicable to large, dynamically changing databases and anomaly detection.

data mining, linkage, machine learning, (18 more...)

arXiv.org Machine Learning

1404.3439

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Discerning Linkage-Based Algorithms Among Hierarchical Clustering Methods

Ackerman, Margareta (University of Waterloo) | Ben-David, Shai (University of Waterloo)

AAAI ConferencesJul-19-2011

Selecting a clustering algorithm is a perplexing task. Yet since different algorithms may yield dramatically different outputs on the same data, the choice of algorithm is crucial. When selecting a clustering algorithm, users tend to focus on cost-related considerations (software purchasing costs, running times, etc). Differences concerning the output of the algorithms are not usually considered. Recently, a formal approach for selecting a clustering algorithm has been proposed (Ackerman, Ben-David, and Loker, NIPS 2010). The approach involves distilling abstract properties of the input-output behavior of different clustering paradigms and classifying algorithms based on these properties. In this paper, we extend this approach into the hierarchical setting. The class of linkage-based algorithms is perhaps the most popular class of hierarchical algorithms. We identify two properties of hierarchical algorithms, and prove that linkage-based algorithms are the only ones that satisfy both of these properties. Our characterization clearly delineates the difference between linkage-based algorithms and other hierarchical algorithms. We formulate an intuitive notion of locality of a hierarchical algorithm that distinguishes between linkage-based and "global" hierarchical algorithms like bisecting k-means, and prove that popular divisive hierarchical algorithms produce clusterings that cannot be produced by any linkage-based algorithm.

algorithm, dendrogram, hierarchical algorithm, (14 more...)

AAAI Conferences

Twenty-Second International Joint Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback